{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xLOXFOT5Q40E"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "iiQkM5ZgQ8r2"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j6331ZSsQGY3"
      },
      "source": [
        "# Barren plateaus"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i9Jcnb8bQQyd"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/quantum/tutorials/barren_plateaus\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/quantum/blob/master/docs/tutorials/barren_plateaus.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/quantum/blob/master/docs/tutorials/barren_plateaus.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/quantum/docs/tutorials/barren_plateaus.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DyEcfFapraq6"
      },
      "source": [
        "In this example you will explore the result of <a href=\"https://www.nature.com/articles/s41467-018-07090-4\" class=\"external\">McClean, 2019</a> that says not just any quantum neural network structure will do well when it comes to learning. In particular you will see that a certain large family of random quantum circuits do not serve as good quantum neural networks, because they have gradients that vanish almost everywhere. In this example you won't be training any models for a specific learning problem, but instead focusing on the simpler problem of understanding the behaviors of gradients."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zB_Xw0Y9rVNi"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TorxE5tnkvb2"
      },
      "outputs": [],
      "source": [
        "!pip install tensorflow==2.3.1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FxkQA6oblNqI"
      },
      "source": [
        "Install TensorFlow Quantum:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "saFHsRDpkvkH"
      },
      "outputs": [],
      "source": [
        "!pip install tensorflow-quantum"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1PaclXeSrrMW"
      },
      "source": [
        "Now import TensorFlow and the module dependencies:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "enZ300Bflq80"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "import tensorflow_quantum as tfq\n",
        "\n",
        "import cirq\n",
        "import sympy\n",
        "import numpy as np\n",
        "\n",
        "# visualization tools\n",
        "%matplotlib inline\n",
        "import matplotlib.pyplot as plt\n",
        "from cirq.contrib.svg import SVGCircuit\n",
        "\n",
        "np.random.seed(1234)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b08Mmbs8lr81"
      },
      "source": [
        "## 1. Summary\n",
        "\n",
        "Random quantum circuits with many blocks that look like this ($R_{P}(\\theta)$ is a random Pauli rotation):<br/>\n",
        "<img src=\"./images/barren_2.png\" width=700>\n",
        "\n",
        "Where if $f(x)$ is defined as the expectation value w.r.t. $Z_{a}Z_{b}$ for any qubits $a$ and $b$, then there is a problem that $f'(x)$  has a mean very close to 0 and does not vary much. You will see this below:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y31qSRCczI-L"
      },
      "source": [
        "## 2. Generating random circuits\n",
        "\n",
        "The construction from the paper is straightforward to follow. The following implements a simple function that generates a random quantum circuit—sometimes referred to as a *quantum neural network* (QNN)—with the given depth on a set of qubits:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Nh9vrgPBks7O"
      },
      "outputs": [],
      "source": [
        "def generate_random_qnn(qubits, symbol, depth):\n",
        "    \"\"\"Generate random QNN's with the same structure from McClean et al.\"\"\"\n",
        "    circuit = cirq.Circuit()\n",
        "    for qubit in qubits:\n",
        "        circuit += cirq.ry(np.pi / 4.0)(qubit)\n",
        "\n",
        "    for d in range(depth):\n",
        "        # Add a series of single qubit rotations.\n",
        "        for i, qubit in enumerate(qubits):\n",
        "            random_n = np.random.uniform()\n",
        "            random_rot = np.random.uniform(\n",
        "            ) * 2.0 * np.pi if i != 0 or d != 0 else symbol\n",
        "            if random_n > 2. / 3.:\n",
        "                # Add a Z.\n",
        "                circuit += cirq.rz(random_rot)(qubit)\n",
        "            elif random_n > 1. / 3.:\n",
        "                # Add a Y.\n",
        "                circuit += cirq.ry(random_rot)(qubit)\n",
        "            else:\n",
        "                # Add a X.\n",
        "                circuit += cirq.rx(random_rot)(qubit)\n",
        "\n",
        "        # Add CZ ladder.\n",
        "        for src, dest in zip(qubits, qubits[1:]):\n",
        "            circuit += cirq.CZ(src, dest)\n",
        "\n",
        "    return circuit\n",
        "\n",
        "\n",
        "generate_random_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gUuQfOyrj_Hu"
      },
      "source": [
        "The authors investigate the gradient of a single parameter $\\theta_{1,1}$. Let's follow along by placing a `sympy.Symbol` in the circuit where $\\theta_{1,1}$ would be. Since the authors do not analyze the statistics for any other symbols in the circuit, let's replace them with random values now instead of later."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lAVDRQ87k3md"
      },
      "source": [
        "## 3. Running the circuits\n",
        "\n",
        "Generate a few of these circuits along with an observable to test the claim that the gradients don't vary much. First, generate a batch of random circuits. Choose a random *ZZ* observable and batch calculate the gradients and variance using TensorFlow Quantum."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qoDDaHgwj_Hz"
      },
      "source": [
        "### 3.1 Batch variance computation\n",
        "\n",
        "Let's write a helper function that computes the variance of the gradient of a given observable over a batch of circuits:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OkdndnBKk8B8"
      },
      "outputs": [],
      "source": [
        "def process_batch(circuits, symbol, op):\n",
        "    \"\"\"Compute the variance of a batch of expectations w.r.t. op on each circuit that \n",
        "    contains `symbol`. Note that this method sets up a new compute graph every time it is\n",
        "    called so it isn't as performant as possible.\"\"\"\n",
        "\n",
        "    # Setup a simple layer to batch compute the expectation gradients.\n",
        "    expectation = tfq.layers.Expectation()\n",
        "\n",
        "    # Prep the inputs as tensors\n",
        "    circuit_tensor = tfq.convert_to_tensor(circuits)\n",
        "    values_tensor = tf.convert_to_tensor(\n",
        "        np.random.uniform(0, 2 * np.pi, (n_circuits, 1)).astype(np.float32))\n",
        "\n",
        "    # Use TensorFlow GradientTape to track gradients.\n",
        "    with tf.GradientTape() as g:\n",
        "        g.watch(values_tensor)\n",
        "        forward = expectation(circuit_tensor,\n",
        "                              operators=op,\n",
        "                              symbol_names=[symbol],\n",
        "                              symbol_values=values_tensor)\n",
        "\n",
        "    # Return variance of gradients across all circuits.\n",
        "    grads = g.gradient(forward, values_tensor)\n",
        "    grad_var = tf.math.reduce_std(grads, axis=0)\n",
        "    return grad_var.numpy()[0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JINYTIjDj_H1"
      },
      "source": [
        "### 3.1 Set up and run\n",
        "\n",
        "Choose the number of random circuits to generate along with their depth and the amount of qubits they should act on. Then plot the results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xAGBcq9Bj_H3"
      },
      "outputs": [],
      "source": [
        "n_qubits = [2 * i for i in range(2, 7)\n",
        "           ]  # Ranges studied in paper are between 2 and 24.\n",
        "depth = 50  # Ranges studied in paper are between 50 and 500.\n",
        "n_circuits = 200\n",
        "theta_var = []\n",
        "\n",
        "for n in n_qubits:\n",
        "    # Generate the random circuits and observable for the given n.\n",
        "    qubits = cirq.GridQubit.rect(1, n)\n",
        "    symbol = sympy.Symbol('theta')\n",
        "    circuits = [\n",
        "        generate_random_qnn(qubits, symbol, depth) for _ in range(n_circuits)\n",
        "    ]\n",
        "    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])\n",
        "    theta_var.append(process_batch(circuits, symbol, op))\n",
        "\n",
        "plt.semilogy(n_qubits, theta_var)\n",
        "plt.title('Gradient Variance in QNNs')\n",
        "plt.xlabel('n_qubits')\n",
        "plt.ylabel('$\\\\partial \\\\theta$ variance')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qY2E0CFjxRE9"
      },
      "source": [
        "This plot shows that for quantum machine learning problems, you can't simply guess a random QNN ansatz and hope for the best. Some structure must be present in the model circuit in order for gradients to vary to the point where learning can happen."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4RE_idhmj_H6"
      },
      "source": [
        "## 4. Heuristics\n",
        "\n",
        "An interesting heuristic by <a href=\"https://arxiv.org/pdf/1903.05076.pdf\" class=\"external\">Grant, 2019</a> allows one to start very close to random, but not quite. Using the same circuits as McClean et al., the authors propose a different initialization technique for the classical control parameters to avoid barren plateaus. The initialization technique starts some layers with totally random control parameters—but, in the layers immediately following, choose parameters such that the initial transformation made by the first few layers is undone. The authors call this an *identity block*.\n",
        "\n",
        "The advantage of this heuristic is that by changing just a single parameter, all other blocks outside of the current block will remain the identity—and the gradient signal comes through much stronger than before. This allows the user to pick and choose which variables and blocks to modify to get a strong gradient signal. This heuristic does not prevent the user from falling in to a barren plateau during the training phase (and restricts a fully simultaneous update), it just guarantees that you can start outside of a plateau."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Fofv9hgyj_IB"
      },
      "source": [
        "### 4.1 New QNN construction\n",
        "\n",
        "Now construct a function to generate identity block QNNs. This implementation is slightly different than the one from the paper. For now, look at the behavior of the gradient of a single parameter so it is consistent with McClean et al, so some simplifications can be made.\n",
        "\n",
        "To generate an identity block and train the model, generally you need $U1(\\theta_{1a}) U1(\\theta_{1b})^{\\dagger}$ and not $U1(\\theta_1) U1(\\theta_1)^{\\dagger}$. Initially $\\theta_{1a}$ and $\\theta_{1b}$ are the same angles but they are learned independently. Otherwise, you will always get the identity even after training. The choice for the number of identity blocks is empirical. The deeper the block, the smaller the variance in the middle of the block. But at the start and end of the block, the variance of the parameter gradients should be large. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PL7mTHEVj_IC"
      },
      "outputs": [],
      "source": [
        "def generate_identity_qnn(qubits, symbol, block_depth, total_depth):\n",
        "    \"\"\"Generate random QNN's with the same structure from Grant et al.\"\"\"\n",
        "    circuit = cirq.Circuit()\n",
        "\n",
        "    # Generate initial block with symbol.\n",
        "    prep_and_U = generate_random_qnn(qubits, symbol, block_depth)\n",
        "    circuit += prep_and_U\n",
        "\n",
        "    # Generate dagger of initial block without symbol.\n",
        "    U_dagger = (prep_and_U[1:])**-1\n",
        "    circuit += cirq.resolve_parameters(\n",
        "        U_dagger, param_resolver={symbol: np.random.uniform() * 2 * np.pi})\n",
        "\n",
        "    for d in range(total_depth - 1):\n",
        "        # Get a random QNN.\n",
        "        prep_and_U_circuit = generate_random_qnn(\n",
        "            qubits,\n",
        "            np.random.uniform() * 2 * np.pi, block_depth)\n",
        "\n",
        "        # Remove the state-prep component\n",
        "        U_circuit = prep_and_U_circuit[1:]\n",
        "\n",
        "        # Add U\n",
        "        circuit += U_circuit\n",
        "\n",
        "        # Add U^dagger\n",
        "        circuit += U_circuit**-1\n",
        "\n",
        "    return circuit\n",
        "\n",
        "\n",
        "generate_identity_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2, 2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ifWrl19kj_IG"
      },
      "source": [
        "### 4.2 Comparison\n",
        "\n",
        "Here you can see that the heuristic does help to keep the variance of the gradient from vanishing as quickly:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "62kmsVAXj_IH"
      },
      "outputs": [],
      "source": [
        "block_depth = 10\n",
        "total_depth = 5\n",
        "\n",
        "heuristic_theta_var = []\n",
        "\n",
        "for n in n_qubits:\n",
        "    # Generate the identity block circuits and observable for the given n.\n",
        "    qubits = cirq.GridQubit.rect(1, n)\n",
        "    symbol = sympy.Symbol('theta')\n",
        "    circuits = [\n",
        "        generate_identity_qnn(qubits, symbol, block_depth, total_depth)\n",
        "        for _ in range(n_circuits)\n",
        "    ]\n",
        "    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])\n",
        "    heuristic_theta_var.append(process_batch(circuits, symbol, op))\n",
        "\n",
        "plt.semilogy(n_qubits, theta_var)\n",
        "plt.semilogy(n_qubits, heuristic_theta_var)\n",
        "plt.title('Heuristic vs. Random')\n",
        "plt.xlabel('n_qubits')\n",
        "plt.ylabel('$\\\\partial \\\\theta$ variance')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "E0XNSoblj_IK"
      },
      "source": [
        "This is a great improvement in getting stronger gradient signals from (near) random QNNs."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "barren_plateaus.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
